Not Quite Lost in Translation: Mistranslation Alters Adaptive Landscape Topography and the Dynamics of Evolution

Abstract Mistranslation—the erroneous incorporation of amino acids into nascent proteins—is a source of protein variation that is orders of magnitude more frequent than DNA mutation. Like other sources of nongenetic variation, it can affect adaptive evolution. We study the evolutionary consequences of mistranslation with experimental data on mistranslation rates applied to three empirical adaptive landscapes. We find that mistranslation generally flattens adaptive landscapes by reducing the fitness of high fitness genotypes and increasing that of low fitness genotypes, but it does not affect all genotypes equally. Most importantly, it increases genetic variation available to selection by rendering many neutral DNA mutations nonneutral. Mistranslation also renders some beneficial mutations deleterious and vice versa. It increases the probability of fixation of 3–8% of beneficial mutations. Even though mistranslation increases the incidence of epistasis, it also allows populations evolving on a rugged landscape to evolve modestly higher fitness. Our observations show that mistranslation is an important source of nongenetic variation that can affect adaptive evolution on fitness landscapes in multiple ways.

Comparison of the fitness values at the end of adaptive walks on all three adaptive landscapes. Mean ($\bar{x}$) and standard deviation ($\sigma$) of these fitness values are reported as a percentage of the maximum fitness in each landscape. For each combination of landscape and population size $N$, we perform Kruskal-Wallis tests to determine whether any significant differences exist between adaptive walks under three conditions. These conditions are without mistranslation, with mistranslation and low expression (one protein per cell), or with mistranslation and high expression (500 proteins per cell). We report the resulting $H$ statistic and p-value. Each set of conditions has $10^4$ replicates. Pairwise comparisons using Dunn's post-hoc test with Bonferroni correction of the significance of differences between conditions are given in Table S2.

Table S2: Pairwise comparison of the significance of differences between the fitness values at the end of adaptive walks without mistranslation, with mistranslation and low expression (one protein per cell), and with mistranslation and high expression (500 proteins per cell). All comparisons are done between walks on the same landscape at the same population size $N$, and with $10^4$ replicates. We use Dunn's test with Bonferroni correction to assess significance.

Figure S1: Mistranslation flattens the fitness distribution of the toxin-antitoxin (E3) landscape, and changes the fitness effects of some mutations. A) Distribution of the changes in fitness due to mistranslation of $10^4$ randomly chosen genotypes. B) The mean fitness of a genotype's phenotypic neighbours (polypeptide sequences that differ in one amino acid position from the genotype, vertical axis) and the mean fitness of its genotypic neighbours (genotypes with a single nucleotide change, horizontal axis) are positively correlated. C) Mistranslation reduces the effective population size.
White diamonds and lines show the median effective population size and the standard deviation, respectively. The violin plots show a Gaussian kernel density estimate of the distribution of effective population sizes. D) Changes due to mistranslation in the proportions of mutations classified according to their fitness effects as beneficial, neutral (including nearly neutral [1,2]), or deleterious. E) Distributions of the absolute differences in mean fitness between both synonymous (green) and nonsynonymous (brown) pairs of genotypes sampled from the toxin-antitoxin (E3) landscape in the presence of mistranslation at an expression level of one protein per cell. Only nonzero fitness differences are shown. F) Percentage of mutations classified as neutral in the presence of mistranslation for multiple combinations of the population size $N$ and the protein expression level, both of which impact the effective population size $N_e$. G) Distribution of the change in fixation probabilities of beneficial mutations due to mistranslation. Data are shown only for those mutations that are beneficial both with and without mistranslation. The horizontal axis shows the difference in fixation probability due to mistranslation, with zero denoting no change. For example, changes in fixation probability close to negative one signify mutations that are almost certain to fix without mistranslation ($u_{\mathrm{fix}} \approx 1$), but have almost no chance of reaching fixation with mistranslation ($u_{\mathrm{fix}} \approx 0$). In all panels, results are shown for the same set of $10^4$ randomly chosen genotypes from the toxin-antitoxin (E3) landscape, and randomly chosen one-step mutational neighbours, at a population size of $N = 10^6$, and an expression level of one protein per cell.

[...] Data are shown only for those mutations that are beneficial both with and without mistranslation. The horizontal axis shows the difference in fixation probability due to mistranslation, with zero denoting no change. For example, changes in fixation probability close to negative one signify mutations that are almost certain to fix without mistranslation ($u_{\mathrm{fix}} \approx 1$), but have almost no chance of reaching fixation with mistranslation ($u_{\mathrm{fix}} \approx 0$). In all panels, results are shown for the same set of $10^4$ randomly chosen genotypes from the toxin-antitoxin (E2) landscape, and randomly chosen one-step mutational neighbours, at a population size of $N = 10^6$, and an expression level of one protein per cell.

Our model of mistranslation begins with a nucleotide sequence encoding part of the focal protein, and the mistranslation rates of all the codons within the nucleotide sequence, i.e. the rates at which translation incorporates one of the noncognate amino acids instead of the cognate amino acid. We assume that the probability of correct translation is one minus the sum of all mistranslation rates. For every polypeptide sequence of length $L$ encoded by a nucleotide sequence of length $3L$, (mis)translation produces from this nucleotide sequence one of $k = 20^L$ protein variants with a unique polypeptide sequence.

Any one cell that expresses the focal protein will harbour multiple variants of this protein, one resulting from correct translation, and all others from mistranslation. We estimate the number of each mistranslated variant in the total pool of $n$ copies of the focal protein in the cell through a multinomial distribution with $n$ 'trials'. Each trial corresponds to the biosynthesis of a protein, and can have one of $k$ possible outcomes, each a protein variant. Each protein variant $i$ (with $i = 1, \ldots, k$) has its own probability of being produced $p_i$, which is determined by the mistranslation rates of the encoding nucleotide sequence.
We assume that the mistranslation rate at each codon is independent from that at other codons, because experimental measurements suggest that most translation errors are due to codon-anticodon mispairing in the ribosome [3,4]. Consequently, the probability $p_i$ of producing a given protein variant $i$ is the product of the probabilities that translation incorporates the amino acids that make up the protein's sequence, given the encoding nucleotide sequence. More specifically, the multinomial distribution for the number of copies $x_i$ of each protein variant $i = 1, \ldots, k$ is given by

$$P(x_1, \ldots, x_k) = \frac{n!}{x_1! \cdots x_k!} \prod_{i=1}^{k} p_i^{x_i}, \qquad \sum_{i=1}^{k} x_i = n.$$

Supplementary figures
A single cell producing $n$ copies of the focal protein will contain a tiny fraction of all $k$ possible protein variants. However, large populations of such cells will produce a larger fraction of variants. Over multiple generations or evolutionary time-scales, even rare protein variants will appear. Consequently, for the purpose of deriving our model we assume that all $k$ protein variants may be produced from a given nucleotide sequence. In practice, some of these variants will have such low probability of being produced through mistranslation that they have no influence on the fitness or evolution of a population, and in our simulations we ignore these variants (see below).
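The multinomial model described above can be illustrated with a minimal sketch. It assumes a toy two-letter amino acid alphabet and hypothetical per-codon probabilities (the model itself uses 20 amino acids and experimentally estimated rates); the function names `variant_probabilities` and `sample_cell` are illustrative and do not come from the original study.

```python
import itertools
import random
from collections import Counter


def variant_probabilities(site_probs):
    """Probability of each protein variant, assuming independent codons.

    site_probs holds one dict per codon, mapping each amino acid to the
    probability that translation incorporates it at that codon.
    """
    variants = {}
    for combo in itertools.product(*(sorted(site) for site in site_probs)):
        p = 1.0
        for site, amino_acid in zip(site_probs, combo):
            p *= site[amino_acid]  # independence: multiply per-codon terms
        variants["".join(combo)] = p
    return variants


def sample_cell(variants, n, rng):
    """Draw the copy numbers of all variants for one cell with n proteins."""
    names = list(variants)
    weights = [variants[name] for name in names]
    # n independent draws with fixed probabilities form a multinomial sample
    counts = Counter(rng.choices(names, weights=weights, k=n))
    return {name: counts.get(name, 0) for name in names}


# Hypothetical rates: codon 1 encodes 'A', codon 2 encodes 'C'.
SITE_PROBS = [{"A": 0.999, "C": 0.001}, {"A": 0.002, "C": 0.998}]
VARIANTS = variant_probabilities(SITE_PROBS)
CELL = sample_cell(VARIANTS, n=500, rng=random.Random(0))
```

Enumerating all variants is only feasible for very short toy sequences, which is one reason to ignore variants with negligible probability in practice.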

With this multinomial distribution we then calculate the expected number of copies of each [...] The expected number of copies $E(x_i)$ of protein variant $i$ produced in a cell is

$$E(x_i) = n p_i,$$

and its variance is

$$\mathrm{Var}(x_i) = n p_i (1 - p_i).$$

We calculate these two quantities for the number of copies of each possible protein variant $x_1, \ldots, x_k$. Because the number of proteins produced per cell is finite, the production of a copy of one protein variant decreases the probability that a cell will contain a copy of another protein [...]

$$\mathrm{Cov}(x_i, x_j) = -n p_i p_j \qquad (i \neq j).$$

Using the multinomial distribution of a cell's protein composition together with experimental [...] we assume that every protein copy contributes equally to fitness, and that the fitness contribution $w_i$ of copy $i$ is independent of that of all other copies produced. The fitness of a cell then becomes the mean of the fitness contributions of all copies it produces. In consequence, the expected fitness of a cell $E(f)$ is given by

$$E(f) = \sum_{i=1}^{k} \frac{w_i}{n} E(x_i) = \sum_{i=1}^{k} w_i p_i,$$

where $n$ is the total number of proteins per cell, $k$ the number of all possible protein variants, $p_i$ [...] $\mathrm{Cov}\left(\frac{w_i}{n} x_i, \frac{w_j}{n} x_j\right) = -n \, p_i \, p_j \, \frac{w_i}{n} \, \frac{w_j}{n}$. Third, the variance of the sum of two random [...]

During our simulations we save computational resources by ignoring the contributions of all protein variants whose probability of being produced by translation is less than $10^{-9}$ when computing the mean fitness of cells and its variance.

Following the literature in the field [5][6][7], we define a phenotypic mutation as the mistranslation of a specific codon into a specific (wrong) amino acid. We use experimentally estimated mistranslation rates from a previous study relying on proteome-wide mass-spectrometry in E. coli [4].
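The mean and variance of cell fitness under the multinomial model can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: cell fitness is taken as $f = \frac{1}{n}\sum_i w_i x_i$, the variance is assembled from the per-variant variances and pairwise covariances of the multinomial distribution, and the function name `fitness_moments` is hypothetical. The $10^{-9}$ cutoff follows the text.

```python
def fitness_moments(probs, fitness, n, cutoff=1e-9):
    """Expected cell fitness and its variance for n protein copies per cell.

    probs[i] is the probability p_i of producing variant i and fitness[i]
    its fitness contribution w_i. Variants with p_i below the cutoff are
    ignored, as described in the text.
    """
    kept = [(p, w) for p, w in zip(probs, fitness) if p >= cutoff]

    # E(f) = sum_i (w_i / n) * n * p_i = sum_i w_i * p_i
    mean = sum(w * p for p, w in kept)

    var = 0.0
    for i, (p_i, w_i) in enumerate(kept):
        # Var((w_i / n) * x_i) = (w_i / n)^2 * n * p_i * (1 - p_i)
        var += (w_i / n) ** 2 * n * p_i * (1.0 - p_i)
        for p_j, w_j in kept[i + 1:]:
            # Cov((w_i / n) x_i, (w_j / n) x_j) = -n p_i p_j (w_i/n)(w_j/n),
            # counted twice because each pair appears as (i, j) and (j, i)
            var += 2.0 * (-n * p_i * p_j) * (w_i / n) * (w_j / n)
    return mean, var


# Toy values: three variants with hypothetical probabilities and fitness.
MEAN_F, VAR_F = fitness_moments([0.5, 0.3, 0.2], [1.0, 0.5, 0.0], n=10)
```

Under these assumptions the variance simplifies to $\frac{1}{n}\left[\sum_i w_i^2 p_i - \left(\sum_i w_i p_i\right)^2\right]$, so it shrinks as the expression level $n$ grows.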

The study computed a ratio of mistranslated to correctly translated proteins. This 'intensity ratio' provides a direct estimate of the mistranslation rate per codon, which is reliable for common (but not for rare) mistranslation errors [4].

As an estimate for the rate at which any one phenotypic mutation occurs, we use the median [...]

Figure S5: Interpolation of mistranslation rates using data from [4]. A) Correlation of the experimental mistranslation rate estimates of a given phenotypic mutation (vertical axis) with how many times the pertinent phenotypic mutation is observed (horizontal axis, each sample is an observation). Black crosses indicate median mistranslation rate estimates, and grey lines the corresponding 95 percent confidence intervals. The blue line shows the best linear regression fit when using all data, and the red line the fit when all mistranslation rate estimates with fewer than 18 samples are ignored. B) Effect of different thresholds for exclusion (all mistranslation rate estimates with fewer samples than the threshold are ignored) on the intercept (black line) of the linear regression of the number of samples with mistranslation rate estimates (lines in panel A), and on the percentage of observations excluded from the regression analysis.

Landscape data
We draw on three experimentally determined landscapes for all our analyses, as described in more detail below. The first of these is an antibody-binding landscape [8]. The second and third are toxin-antitoxin fitness landscapes that quantify the fitness of E. coli cells in which variants of an antitoxin bind to a toxin protein [9]. All landscapes are represented at the level of amino acids, because the nucleotide sequences encoding these amino acids have not been reported [8,9]. We simulate evolution at the level of nucleotide sequences, so we map each nucleotide sequence to the amino acid sequence it encodes. We chose these landscapes for two reasons. [...]

In our analysis, a fitness peak can either consist of a single genotype or a network of multiple genotypes that are connected by single mutations, and in which genotypes that differ in a single mutation are nearly neutral [2], i.e. separated by mutations with such small fitness effects that they effectively evolve by genetic drift. We define a nearly neutral mutation as a mutation with a selection coefficient $|s| < \frac{1}{4 N_e}$ relative to the wild-type [1]. We consider such a genotype or network of genotypes a peak if all its single-mutation, non-neutral neighbours have lower fitness.
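The near-neutrality criterion and the peak condition for a single genotype can be sketched as below. This is a minimal illustration with assumptions: the selection coefficient is taken as $s = (w_{mut} - w_{wt}) / w_{wt}$, which the text does not spell out, and the function names are hypothetical. The sketch does not expand nearly neutral neighbours into a network.

```python
def selection_coefficient(w_mutant, w_wildtype):
    """Relative fitness difference of a mutant (assumed definition of s)."""
    return (w_mutant - w_wildtype) / w_wildtype


def classify_mutation(w_mutant, w_wildtype, n_e):
    """Label a mutation as neutral, beneficial, or deleterious.

    A mutation counts as (nearly) neutral when |s| < 1 / (4 * N_e).
    """
    s = selection_coefficient(w_mutant, w_wildtype)
    if abs(s) < 1.0 / (4.0 * n_e):
        return "neutral"
    return "beneficial" if s > 0 else "deleterious"


def has_no_fitter_neighbour(w_focal, neighbour_fitnesses, n_e):
    """True if every non-neutral one-step neighbour has lower fitness."""
    return all(
        classify_mutation(w, w_focal, n_e) != "beneficial"
        for w in neighbour_fitnesses
    )
```

For example, a mutant with fitness 1.001 relative to a wild-type fitness of 1.0 is classified as beneficial at $N_e = 10^6$ but as neutral at $N_e = 100$, which illustrates why a reduction in effective population size changes the share of neutral mutations.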

To search for fitness peaks in our three landscapes, we choose a set of $10^5$ genotypes with the highest fitness as seeds for the search. For each seed, we examine whether every one of the one-step mutational neighbours of the seed is nearly neutral or not. We test for neutrality [...] the network to constitute a fitness peak.

We estimate the incidence of epistasis by sampling $10^4$ genotypes from each of our landscapes. For each of these starting genotypes, we create a 'square' of four genotypes (see also supplementary figure S6). To do so, we choose two sites at random in the starting genotype (which we denote as 00) and replace either one or both sites with alternative nucleotides, generating an additional three nucleotide sequences for each starting genotype. Two of these sequences are single mutants (01 and 10), and one a double mutant (11). We quantify any deviation from additivity by using the linear combination of the fitness $w$ of each genotype [14]:

$$\varepsilon = w_{11} + w_{00} - w_{01} - w_{10}.$$

Any square that deviates from additivity ($\varepsilon \neq 0$) is epistatic. To differentiate between magnitude epistasis, simple sign epistasis, or reciprocal epistasis, we assess whether a given mutation causes [...]

In the main text we observe that mistranslation greatly reduces the number of nearly neutral mutations. This means that mistranslation is likely to disrupt networks of nearly neutral mutants.
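The classification of a genotype square into epistasis types can be sketched as follows. The sketch assumes the common convention in which $\varepsilon = w_{11} + w_{00} - w_{01} - w_{10}$ measures the deviation from additivity, and in which sign epistasis is called when a mutation's fitness effect changes sign between the two backgrounds. The function name and the numerical tolerance `tol` are illustrative assumptions.

```python
def classify_epistasis(w00, w01, w10, w11, tol=1e-12):
    """Return (epsilon, label) for one square of four genotypes.

    Labels: 'none', 'magnitude', 'simple sign', or 'reciprocal sign'.
    """
    eps = w11 + w00 - w01 - w10
    if abs(eps) <= tol:
        return eps, "none"

    # Effect of the mutation leading to 10, on backgrounds 00 and 01
    first_flips = (w10 - w00) * (w11 - w01) < 0
    # Effect of the mutation leading to 01, on backgrounds 00 and 10
    second_flips = (w01 - w00) * (w11 - w10) < 0

    if first_flips and second_flips:
        return eps, "reciprocal sign"
    if first_flips or second_flips:
        return eps, "simple sign"
    return eps, "magnitude"
```

With this convention, a square such as `(1.0, 0.5, 0.5, 1.5)`, in which both single mutants are less fit than the starting genotype but the double mutant is fitter, is labelled reciprocal sign epistasis, the type associated with multiple peaks.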

Here we quantify the extent of this disruption.

In order to find nearly neutral networks, we use the same method we use to find nearly neutral networks that are fitness peaks, except that we now consider such networks throughout [...]

For most of this contribution, we assume that the empirical fitness measurements $f_1$ are free of the effect of mistranslation and simulate the effect of mistranslation on these measurements to calculate fitness with mistranslation $f_2$. We will now estimate the true fitness of these genotypes $f_0$ from the empirical measurements $f_1$. Because the available data report fitness $f_1$ at the level of amino acids and not nucleotide sequences, we can only approximate the effect of mistranslation on fitness measurements when trying to infer $f_0$.

We estimate mistranslation rates at the level of amino acid sequences, using experimentally measured mistranslation rates for each codon that encodes a specific amino acid sequence [4], and averaging the mistranslation rate over all such codons. With these averaged mistranslation rates we estimate the probability that translation produces either the encoded amino acid sequence or an alternative sequence. For every amino acid sequence in the antibody-binding landscape, we calculate the probability of translation producing the encoded amino acid sequence, or all other amino acid sequences in the landscape. We place these probabilities into a matrix $A$ of size $n \times n$, where $n = 149361$ is the number of amino acid sequences in the antibody-binding landscape. We then infer $f_0$ by solving the system of linear equations $f_0 = A f_1$.

The inferred fitness $f_0$ has a mean of $8.15 \times 10^{-2} \pm 0.402$, which is significantly different